---
title: "Proportional Improvement in Spatial MSE: Finding an Optimal K"
author: "Erin M. Ochoa"
date: "04/08/2019"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    theme: flatly
    source_code: embed
    mathjax: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-MML-AM_CHTML
---
```{r setup}
library(sf)
library(sp)
library(spdep)
library(rgdal)
library(leaflet)
library(mapview)
library(htmltools)
library(tidyverse)
library(RColorBrewer)
```
```{r style_mathjax}
# Use MathJax; tweak the nav bar; define styles for text slides
# https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-MML-AM_CHTML
```
### Terms Defined {data-commentary-width=0}
Calculating the Mean Squared Error for a SKATER Model

Euclidean dissimilarity, $d$:

For two polygons, $p_i$ and $p_j$, each measured on $n$ attribute variables, the Euclidean distance between them in $n$-dimensional attribute space is:

$$\begin{equation}
d(p_{i},p_{j}) = \sqrt{(p_{i1} - p_{j1})^2 + (p_{i2} - p_{j2})^2 + \cdots + (p_{in} - p_{jn})^2}
\end{equation}$$
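
As a minimal sketch of this definition, the chunk below computes $d$ for two made-up attribute vectors and checks it against base R's `dist()`. The vectors and the helper name `euclid` are illustrative assumptions, not part of the original analysis.

```{r euclid_sketch}
# Hypothetical attribute vectors for two polygons (n = 3 variables)
p_i <- c(1, 2, 3)
p_j <- c(4, 6, 3)

# Euclidean dissimilarity: square root of the summed squared differences
euclid <- function(a, b) sqrt(sum((a - b)^2))

d_ij <- euclid(p_i, p_j)

# Base R's dist() gives the same value for the rows of a matrix
d_check <- as.numeric(dist(rbind(p_i, p_j)))
```

For these values the squared differences are 9, 16, and 0, so `d_ij` is 5.
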

Mean intraregion dissimilarity, $D$:

For a region $R$ containing $m$ polygons, sum the Euclidean dissimilarities over all $m(m-1)/2$ unordered pairs of polygons in the region, then divide by $m$, the number of polygons in the region (not the number of pairs):

$$\begin{equation}
D(R) =
\frac{\sum\limits_{i=1}^{m-1}\sum\limits_{j=i+1}^m d(p_{i},p_{j})}{m}
\end{equation}$$
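
The chunk below is one possible way to compute $D(R)$ in base R, assuming the region's attributes are stored as a matrix with one row per polygon. The data and the function name `mean_dissim` are hypothetical.

```{r mean_dissim_sketch}
# Hypothetical region: m = 4 polygons (rows), n = 2 attributes (columns)
region <- rbind(c(0, 0), c(3, 4), c(6, 8), c(0, 8))

# dist() returns each unordered pair (i < j) exactly once,
# which matches the double sum in the formula for D(R)
mean_dissim <- function(x) {
  x <- as.matrix(x)
  sum(dist(x)) / nrow(x)  # divide by the number of polygons, m
}

D_R <- mean_dissim(region)
```

Here the six pairwise distances are 5, 10, 8, 5, 5, and 6, which sum to 39, so `D_R` is 39 / 4 = 9.75.
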

Error, $E$:

For a region $R$ containing $m$ polygons, the error, $E$, is the sum, over all pairs of polygons in the region, of the difference between the pairwise Euclidean dissimilarity, $d(p_i,p_j)$, and the mean intraregion dissimilarity, $D(R)$:

$$\begin{equation}
E(R) = \sum\limits_{i=1}^{m-1}\sum\limits_{j=i+1}^m \Big(d(p_{i},p_{j}) - D(R)\Big)
\end{equation}$$
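
A sketch of $E(R)$ under the same assumptions (one row per polygon, made-up data) might look like the following. The name `region_error` is hypothetical, and $D(R)$ is recomputed inside the function so the chunk stands on its own.

```{r error_sketch}
# Hypothetical region: m = 4 polygons (rows), n = 2 attributes (columns)
region <- rbind(c(0, 0), c(3, 4), c(6, 8), c(0, 8))

region_error <- function(x) {
  x <- as.matrix(x)
  d <- as.numeric(dist(x))  # pairwise dissimilarities for i < j
  D <- sum(d) / nrow(x)     # mean intraregion dissimilarity, D(R)
  sum(d - D)                # sum of deviations from D(R) over all pairs
}

E_R <- region_error(region)
```

For this region the distances sum to 39 and $D(R) = 9.75$, so `E_R` is 39 - 6 * 9.75 = -19.5.
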

Mean squared error, $MSE$:

For a SKATER model with $K$ regions, $R_1, \ldots, R_K$, the mean squared error, $MSE$, is the average of the squared errors of the regions in the model:

$$\begin{equation}
MSE(K) =
\frac{\sum\limits_{k=1}^K E(R_k)^2}{K}
\end{equation}$$
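
Putting the pieces together, the chunk below sketches $MSE(K)$ for a toy model with $K = 2$ regions. The attribute matrix `attrs`, the membership vector `groups`, and the helper `region_error` are all illustrative assumptions. In practice the memberships might come from the `groups` element of an `spdep::skater()` result, but that step is not shown here.

```{r mse_sketch}
# Hypothetical attributes for six polygons (rows) on two variables (columns)
attrs <- rbind(c(0, 0), c(3, 4), c(6, 8), c(0, 8), c(1, 1), c(4, 5))

# Hypothetical region membership for each polygon (K = 2 regions)
groups <- c(1, 1, 1, 1, 2, 2)

# Error for one region, E(R), following the definitions above
region_error <- function(x) {
  x <- as.matrix(x)
  d <- as.numeric(dist(x))  # pairwise dissimilarities for i < j
  D <- sum(d) / nrow(x)     # mean intraregion dissimilarity, D(R)
  sum(d - D)
}

# Split the polygons by region, compute E(R_k) for each, then average the squares
regions <- split(as.data.frame(attrs), groups)
errors  <- vapply(regions, region_error, numeric(1))
mse     <- mean(errors^2)
```

The two regional errors are -19.5 and 2.5, so `mse` is (380.25 + 6.25) / 2 = 193.25. Repeating the calculation for models fitted with different values of $K$ would give the series of MSE values that the title's "proportional improvement" compares.
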